Homework 6
Important:
All answers need to be rounded to 3 decimals, except the last problem, which needs the exact answer.
In this question, we will train a Naive Bayes classifier to predict class labels Y as a function of input features.
We are given the following 15 training points:
What is the maximum likelihood estimate of the prior P(Y)?
| Y | P(Y)   |
|---|--------|
| A | [q1.1] |
| B | [q1.2] |
| C | [q1.3] |
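Since the training table itself is not reproduced above, here is a minimal Python sketch of the computation this part asks for: the maximum likelihood prior is just the empirical class frequency, count(Y = y) / N. The `labels` list is a hypothetical placeholder, not the actual 15 training labels.

```python
from collections import Counter

# Hypothetical labels -- substitute the 15 class labels from the training table.
labels = ["A", "A", "B", "B", "B", "C", "C"]

def mle_prior(labels):
    """Maximum likelihood prior: P(Y = y) = count(y) / N."""
    counts = Counter(labels)
    n = len(labels)
    return {y: round(c / n, 3) for y, c in sorted(counts.items())}

print(mle_prior(labels))  # class frequencies, rounded to 3 decimals
```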
What are the maximum likelihood estimates of the conditional probability distributions? Fill in the tables below (the second and third are done for you).
First feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | [q1.4]       |
| 1     | A | [q1.5]       |
| 0     | B | [q1.6]       |
| 1     | B | [q1.7]       |
| 0     | C | [q1.8]       |
| 1     | C | [q1.9]       |

Second feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | 1.000        |
| 1     | A | 0.000        |
| 0     | B | 0.222        |
| 1     | B | 0.778        |
| 0     | C | 0.250        |
| 1     | C | 0.750        |

Third feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | 0.500        |
| 1     | A | 0.500        |
| 0     | B | 0.000        |
| 1     | B | 1.000        |
| 0     | C | 0.500        |
| 1     | C | 0.500        |
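A maximum likelihood conditional table is built the same way, restricting the counts to each class: P(F = f | Y = y) = count(F = f, Y = y) / count(Y = y). Below is a minimal sketch for one binary feature; the (value, label) pairs are hypothetical placeholders rather than the real training points.

```python
from collections import Counter

# Hypothetical (feature value, class label) pairs for a single binary feature.
data = [(0, "A"), (1, "A"), (0, "B"), (1, "B"), (1, "B"), (0, "C")]

def mle_conditional(data):
    """ML estimate of P(F = f | Y = y) = count(f, y) / count(y)."""
    joint = Counter(data)                 # counts of (value, label) pairs
    label = Counter(y for _, y in data)   # counts of each label
    return {(f, y): round(c / label[y], 3) for (f, y), c in sorted(joint.items())}

print(mle_conditional(data))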
Following question 1, now consider a new data point. Use your classifier to determine the joint probability of cause Y and this new data point, along with the posterior probability of Y given the new data:
| Y | P(Y, new point) |
|---|-----------------|
| A | [q2.1]          |
| B | [q2.2]          |
| C | [q2.3]          |

| Y | P(Y ∣ new point) |
|---|------------------|
| A | [q2.4]           |
| B | [q2.5]           |
| C | [q2.6]           |
What label does your classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q2.7]
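For the prediction itself, a minimal sketch of the Naive Bayes computation is shown below: the joint is the prior times the product of the per-feature conditionals, the posterior is the joint renormalised, and the predicted label is the posterior argmax with ties broken alphabetically. All numbers and the point `x_new` are hypothetical placeholders, not values from this problem.

```python
import math

# Hypothetical prior and conditional tables (one dict per feature,
# keyed by (class, feature value)) -- not the values from this problem.
prior = {"A": 0.2, "B": 0.4, "C": 0.4}
cpts = [
    {("A", 0): 1.0, ("A", 1): 0.0, ("B", 0): 0.25, ("B", 1): 0.75, ("C", 0): 0.5, ("C", 1): 0.5},
    {("A", 0): 0.5, ("A", 1): 0.5, ("B", 0): 0.40, ("B", 1): 0.60, ("C", 0): 0.5, ("C", 1): 0.5},
    {("A", 0): 0.5, ("A", 1): 0.5, ("B", 0): 0.20, ("B", 1): 0.80, ("C", 0): 0.5, ("C", 1): 0.5},
]
x_new = (0, 1, 1)  # hypothetical new data point

# Joint: P(Y = y, x) = P(y) * prod_i P(F_i = x_i | y)
joint = {y: prior[y] * math.prod(cpt[(y, f)] for cpt, f in zip(cpts, x_new)) for y in prior}

# Posterior: normalise the joint over the classes.
total = sum(joint.values())
posterior = {y: p / total for y, p in joint.items()}

# Prediction: highest posterior, ties broken alphabetically.
prediction = max(sorted(posterior), key=posterior.get)
print(joint, posterior, prediction)
```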
The training data is repeated here for your convenience:
Following the previous questions, now use Laplace Smoothing with strength k = 3 to estimate the prior P(Y) for the same data.
| Y | P(Y)   |
|---|--------|
| A | [q3.1] |
| B | [q3.2] |
| C | [q3.3] |
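As a minimal sketch, Laplace smoothing with strength k adds k pseudo-counts to every class before normalising: P(Y = y) = (count(y) + k) / (N + k·|Y|). With the real data, N = 15 and |Y| = 3; the labels below are hypothetical placeholders.

```python
from collections import Counter

def laplace_prior(labels, k=3):
    """Laplace-smoothed prior: (count(y) + k) / (N + k * number of classes)."""
    counts = Counter(labels)
    n, num_classes = len(labels), len(counts)
    return {y: round((c + k) / (n + k * num_classes), 3) for y, c in sorted(counts.items())}

print(laplace_prior(["A", "A", "B", "B", "B", "C", "C"], k=3))  # hypothetical labels
```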
Use Laplace Smoothing with strength k = 3 to estimate the conditional probability distributions below (again, the second and third are done for you).
First feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | [q3.4]       |
| 1     | A | [q3.5]       |
| 0     | B | [q3.6]       |
| 1     | B | [q3.7]       |
| 0     | C | [q3.8]       |
| 1     | C | [q3.9]       |

Second feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | 0.625        |
| 1     | A | 0.375        |
| 0     | B | 0.333        |
| 1     | B | 0.667        |
| 0     | C | 0.400        |
| 1     | C | 0.600        |

Third feature:

| Value | Y | P(Value ∣ Y) |
|-------|---|--------------|
| 0     | A | 0.500        |
| 1     | A | 0.500        |
| 0     | B | 0.200        |
| 1     | B | 0.800        |
| 0     | C | 0.500        |
| 1     | C | 0.500        |
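The conditional tables are smoothed the same way, with k pseudo-counts per feature value: P(F = f | Y = y) = (count(f, y) + k) / (count(y) + k·|F|), where |F| = 2 for a binary feature. A minimal sketch, again with hypothetical (value, label) pairs:

```python
from collections import Counter

def laplace_conditional(data, k=3, num_values=2):
    """Laplace-smoothed P(F = f | Y = y) for a feature with `num_values` values."""
    joint = Counter(data)                 # counts of (value, label) pairs
    label = Counter(y for _, y in data)   # counts of each label
    return {(f, y): round((joint[(f, y)] + k) / (label[y] + k * num_values), 3)
            for y in sorted(label) for f in range(num_values)}

# Hypothetical pairs for one binary feature, not the real training points.
print(laplace_conditional([(0, "A"), (1, "A"), (0, "B"), (1, "B"), (1, "B")], k=3))
```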
Now consider again the new data point. Use the Laplace-Smoothed version of your classifier to determine the joint probability of cause Y and this new data point, along with the posterior probability of Y given the new data:
| Y | P(Y, new point) |
|---|-----------------|
| A | [q4.1]          |
| B | [q4.2]          |
| C | [q4.3]          |

| Y | P(Y ∣ new point) |
|---|------------------|
| A | [q4.4]           |
| B | [q4.5]           |
| C | [q4.6]           |
What label does your (Laplace-Smoothed) classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q4.7]
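The prediction step here is unchanged from question 2: the joint/posterior sketch shown there can be reused as-is, with the Laplace-smoothed prior and conditional tables substituted in.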
When training a classifier, it is common to split the available data into a training set, a hold-out set, and a test set, each of which has a different role.
Which data set is used to learn the conditional probabilities?
Which data set is used to tune the Laplace Smoothing hyperparameters?
Which data set is used for quantifying performance results?
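As a rough illustration of these three roles, here is a minimal sketch of such a split; the 70/15/15 proportions and the `split` helper are arbitrary choices for illustration, not part of the problem.

```python
import random

def split(data, train_frac=0.70, holdout_frac=0.15, seed=0):
    """Shuffle the data and cut it into train / hold-out / test portions."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_holdout = int(holdout_frac * len(data))
    train = data[:n_train]                        # learn the probabilities here
    holdout = data[n_train:n_train + n_holdout]   # tune hyperparameters (e.g. k) here
    test = data[n_train + n_holdout:]             # report final performance here
    return train, holdout, test
```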
Consider a context-free grammar with the following rules (assume that S is the start symbol):
S → NP VP
NP → DT NN
NP → NP PP
PP → IN NP
VP → VB NP
DT → the
NN → man
NN → dog
NN → cat
NN → park
VB → saw
IN → in
IN → with
IN → under
How many parse trees are there under this grammar for the sentence: the man saw the dog in the park?
Following the previous question, how many parse trees are there for the sentence: the man saw the dog in the park with the cat?
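One way to count parses (rather than enumerate them by hand) is a CKY-style dynamic program over the grammar, which is already in Chomsky normal form: `chart[i][j][A]` holds the number of ways nonterminal A can derive words i..j-1. A minimal sketch:

```python
from collections import defaultdict

binary = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "IN", "NP"), ("VP", "VB", "NP")]
lexical = {"the": ["DT"], "man": ["NN"], "dog": ["NN"], "cat": ["NN"], "park": ["NN"],
           "saw": ["VB"], "in": ["IN"], "with": ["IN"], "under": ["IN"]}

def count_parses(words):
    """CKY chart where chart[i][j][A] counts derivations of words[i:j] from A."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a in lexical[w]:
            chart[i][i + 1][a] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c in binary:
                    chart[i][j][a] += chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]

print(count_parses("the man saw the dog in the park".split()))
print(count_parses("the man saw the dog in the park with the cat".split()))
```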
The K-means algorithm:
Consider the following PCFG (probabilities for each rule are shown after the rule):
S → NP VP 1.0
PP → P NP 1.0
VP → V NP 0.6
VP → VP PP 0.4
P → with 0.8
P → in 0.2
V → saw 0.7
V → look 0.3
NP → NP PP 0.3
NP → Astronomers 0.12
NP → ears 0.18
NP → saw 0.02
NP → stars 0.18
NP → telescopes 0.2
What is the probability of the best parse tree for the sentence: Astronomers saw stars with ears?
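The best-parse probability can be computed with the probabilistic (Viterbi) variant of the same CKY chart: `best[i][j][A]` holds the maximum probability of deriving words i..j-1 from A, and the answer is `best[0][n][S]`. A minimal sketch over the rules above:

```python
from collections import defaultdict

binary = [("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0), ("VP", "V", "NP", 0.6),
          ("VP", "VP", "PP", 0.4), ("NP", "NP", "PP", 0.3)]
lexical = {"Astronomers": [("NP", 0.12)], "ears": [("NP", 0.18)], "saw": [("NP", 0.02), ("V", 0.7)],
           "stars": [("NP", 0.18)], "telescopes": [("NP", 0.2)], "with": [("P", 0.8)],
           "in": [("P", 0.2)], "look": [("V", 0.3)]}

def best_parse_prob(words):
    """Viterbi CKY: best[i][j][A] = max probability of deriving words[i:j] from A."""
    n = len(words)
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a, p in lexical[w]:
            best[i][i + 1][a] = max(best[i][i + 1][a], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c, p in binary:
                    best[i][j][a] = max(best[i][j][a], p * best[i][k][b] * best[k][j][c])
    return best[0][n]["S"]

print(best_parse_prob("Astronomers saw stars with ears".split()))
```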